The signature table training method consists of cumulative evaluation of a
function (such as a probability density) at pre-assigned co-ordinate values of
the input parameters to the table. The training is conditional, based on a
binary-valued "learning" input to a table which is compared to the label
attached to each training sample. Interpretation of an unknown sample vector is
then equivalent to a table look-up, i.e. extraction of the function value
stored at the proper co-ordinates. Such a technique is very useful when a large
number of samples must be interpreted, as in the case of speech recognition,
and the time required for training as well as for recognition is at a premium.

However  this  method  is  limited  by  prohibitive  storage  requirements,  even  for 
a  moderate  number  of  parameters,  when  their  relative  independence  cannot  be 
assumed.  This  report  investigates  the  conditions  under  which  the  higher 
dimensional  probability  density  function  can  be  decomposed  so  that  the  density 
estimate  is  obtained  by  a  hierarchy  of  signature  tables  with  consequent 
reduction  in  the  storage  requirement. 

Practical  utility  of  the  theoretical  results  obtained  in  the  report  is 
demonstrated  by  a  vowel  recognition  experiment. 
1. Introduction


Signature table training has been described by Samuel [1], where it is used
in a program which plays the game of checkers. The inputs to the tables are
parameters which evaluate specific aspects of a board situation. The output
from the table hierarchy is a number which represents, in a sense, a "figure of
merit" for the board. This board evaluation is then used in the search for the
best possible move.

The signature table scheme has been extended and modified to adapt it for
use in speech recognition [2]. The tables are used to compute the a posteriori
probability of a specific sound feature (such as voicing or a front vowel) or a
sound class (such as a phonemic category) being present. These probabilities
are used for classification of the sound in the Bayesian sense (the actual
implementation makes a compound decision using local context). However, this
scheme makes several implicit, simplifying assumptions with regard to mutual
independence between sets of input parameters. This report describes a method of
probability density estimation using signature tables which does not require
independent sets of parameters and still requires storage of the same order.

The concept of signature tables is best illustrated by a simplified table
arrangement (Fig. 1) similar to the one used in the speech recognition program
[2]. The two-level arrangement has six inputs: the Fi's represent the
frequencies of amplitude maxima (formants) in the vowel spectrum and the Ai's
are their corresponding amplitudes. The parameters are divided into two sets as
inputs to the two first level tables. The outputs from the first level tables
are used as inputs to the second level table. The input marked "V" is a 1-bit
learning input and indicates whether or not the vowel "V" is the tag attached
to the current sample vector.

The inputs are quantized into an adequate range of values, 8 in this case. A
signature table has an entry for every possible combination of inputs. Thus the
size of each first level table is 8*8*8 = 512 entries. Each entry (one computer
word) is divided into three fields. The field [p] is incremented by 1 if "V" is
indicated; otherwise the field [q] is incremented. The output of each entry is
computed as:

$$P(\text{"V"} \mid F_1, F_2, F_3) = \frac{p}{p+q} \tag{1.1}$$

which directly gives the a posteriori probability of the class "V" for the
specific entry shown in Fig. 1. This value is also quantized to a prespecified
accuracy, say 3 bits, and is stored in the output field [r]. The second level
table processes its input in a like manner. Thus the output of the second level
table is the probability

$$P(\text{"V"} \mid P(F_1, F_2, F_3),\, P(A_1, A_2, A_3)) \tag{1.2}$$

whereas the probability we require is

$$P(\text{"V"} \mid F_1, F_2, F_3, A_1, A_2, A_3) \tag{1.3}$$

Thus while any one table generates the required form of probability
density (eq. 1.1), the two level arrangement generates 1.2, which is equivalent
to 1.3 only if the parameters in the sets (F1,F2,F3) and (A1,A2,A3) are
mutually independent.
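
The training rule for a single table (eq. 1.1) can be sketched in Python as
follows. This is only an illustration under stated assumptions: the class name
`SignatureTable`, its method names, and the convention that an entry which was
never visited reports a probability of 0.0 are not taken from the report.

```python
class SignatureTable:
    """One signature table over n_inputs inputs, each quantized to `levels` values."""

    def __init__(self, n_inputs=3, levels=8, out_bits=3):
        self.levels = levels
        self.out_levels = 2 ** out_bits
        size = levels ** n_inputs      # 8**3 = 512 entries in the example
        self.p = [0] * size            # field [p]: samples tagged with the class
        self.q = [0] * size            # field [q]: samples tagged otherwise

    def index(self, inputs):
        """Map a tuple of quantized inputs to the entry it addresses."""
        idx = 0
        for value in inputs:
            idx = idx * self.levels + value
        return idx

    def train(self, inputs, is_class):
        """Increment [p] if the learning input is set, otherwise [q]."""
        i = self.index(inputs)
        if is_class:
            self.p[i] += 1
        else:
            self.q[i] += 1

    def probability(self, inputs):
        """p / (p + q) for the addressed entry (assumed 0.0 if never visited)."""
        i = self.index(inputs)
        total = self.p[i] + self.q[i]
        return self.p[i] / total if total else 0.0

    def output(self, inputs):
        """Output field [r]: the probability quantized to out_bits bits."""
        r = int(self.probability(inputs) * self.out_levels)
        return min(r, self.out_levels - 1)


# Hypothetical training data: four samples fall on the same entry, three tagged "V".
table = SignatureTable()
for tag in (True, True, True, False):
    table.train((2, 5, 1), tag)
```

With these four samples the entry holds p = 3 and q = 1, so `probability`
returns 0.75 and the 3-bit output field is 6.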

The main objective of this report is to investigate the conditions under
which the decomposition of a multidimensional function into two or more factors
is valid, so that a hierarchy of tables as shown in the above example still
generates a true higher dimensional probability density as in equation 1.3
while retaining most of the advantages of the signature table method.

The three advantages that emerge from this method of training as it has
been used in the past are as follows:

1) Essentially arbitrary inter-relationships between the inputs are taken
into account by any one table. The only loss of accuracy is in the quantization.

2) The  training  is  a  simple  process  of  accumulating  counts. The  training 
samples  are  introduced  sequentially,  and  hence  simultaneous  storage  of  all  the 
samples  is  not  required. 

3) The process linearizes the storage requirements. The example shown
requires 2*512+64 = 1088 entries instead of $2^{(3 \times 6)}$ = 256 K entries,
were the entire space to be represented.
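
The entry counts quoted in item 3 can be checked with a few lines of Python.
The variable names are illustrative only; the figures themselves (three 3-bit
inputs per first level table, two 3-bit inputs to the second level table) come
from the example of Fig. 1.

```python
BITS_PER_INPUT = 3
LEVELS = 2 ** BITS_PER_INPUT                    # 8 quantization levels per input

first_level = 2 * LEVELS ** 3                   # two tables, three inputs each
second_level = LEVELS ** 2                      # one table fed by two 3-bit outputs
hierarchy_entries = first_level + second_level

full_table_entries = 2 ** (BITS_PER_INPUT * 6)  # one table over all six inputs

print(hierarchy_entries, full_table_entries)
```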

Before investigating the conditions under which the decomposition of a
multidimensional space is valid, it will simplify the explanation if we first
consider a specific example in qualitative terms only.

Consider the simple table arrangement shown in Fig. 2, where we wish to
take account of 5 input parameters, each requiring, say, 3 bits for its
specification. Were we to do all this in one table it would require $2^{15}$ or
32,768 entries, instead of the 1024 entries required for the two level
arrangement shown, when the output from Table 1 is also quantized to 3 bits.
What we require as the output from the first level table is some function which
represents the contribution made by the inputs to the first table, so that the
output from the second table truly represents the conditional probability that
CLASS has been represented by the specific values of the inputs. Thus, in order
to utilize the Bayesian decision rule we want to determine

$$P(\text{CLASS} \mid A, B, C, D, E)$$

where A, B, C, D and E represent the input parameters. It is easier to
determine the inverse probability during the training phase, in accordance with
the rule

$$P(\text{CLASS} \mid A,B,C,D,E) = \frac{P(A,B,C,D,E \mid \text{CLASS}) \, P(\text{CLASS})}{P(A,B,C,D,E)}.$$

The divisor on the right hand side appears as a common factor in the
conditional probabilities for all the classes and hence need not be taken into
account. P(CLASS), the a priori probability of a class, may either be known for
the recognition problem under consideration, or it can be estimated from the
sample set used during the training. The remaining factor P(A,B,C,D,E | CLASS)
is to be determined using the signature table arrangement shown in Fig. 2.
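
The remark that the divisor need not be taken into account can be illustrated
with a short Python sketch. The function names and all numerical values below
are hypothetical; the normalization step recovers the divisor only under the
assumption that the listed classes are exhaustive.

```python
def classify(likelihoods, priors):
    """Pick the class maximizing P(x | CLASS) * P(CLASS).

    The divisor P(x) is common to every class, so it is omitted.
    """
    scores = {c: likelihoods[c] * priors[c] for c in likelihoods}
    best = max(scores, key=scores.get)
    return best, scores


def posteriors(scores):
    """Normalize the scores; the normalizer plays the role of the divisor P(x)."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical class conditional values and a priori probabilities.
likelihoods = {"V1": 0.020, "V2": 0.050, "V3": 0.010}
priors = {"V1": 0.5, "V2": 0.1, "V3": 0.4}

best, scores = classify(likelihoods, priors)
post = posteriors(scores)
```

The class chosen from the unnormalized scores is the same as the class with the
largest normalized posterior, which is why the divisor can be dropped for
classification.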

Consider the expansion

$$P(A,B,C,D,E \mid \text{CLASS}) = P(D,E \mid A,B,C,\text{CLASS}) \cdot P(A,B,C \mid \text{CLASS})$$

The second factor on the r.h.s., the marginal probability, is independent of
the other inputs and hence is given directly by the counts accumulated in the
first table (field [p], Fig. 2) with appropriate normalization by the total
counts in the table. We now focus our attention on the first factor, the
conditional probability P(D,E | A,B,C, CLASS) computed by the second table. We
observe that the input marked f(A,B,C) partitions this table into sections. In
the expanded case where Table 2 has one such section for every entry in Table
1, there would have been enough such sections to allow for every combination of
values assignable to A, B and C. However, we are not finally interested in
which particular points in (A,B,C,D,E) space are associated with each region
(in the situation where the variables are continuous, each region would be an
equiprobable surface), but only in the value of the probability. Therefore we
need only allow enough space to represent these values to the desired accuracy
(3 bits in this example). Thus the outputs stored in the first level table
(field [r]) correspond to those entries which must be grouped together and have
identical value. In effect what we are doing is to reduce the dimensionality of
the space represented by the second table by 2 (i.e. from (A,B,C,D,E) to
(D,E,f(A,B,C))) by grouping together those points in (A,B,C) space which give
the same overall probability in (A,B,C,D,E) space.

There is an apparent circularity in this argument: we must know the
overall probability, which we are ultimately trying to find, in order to
accomplish the grouping in the lower dimensional space. However, we shall prove
in a later section that there exists a mapping f(A,B,C | CLASS) of the function
P(A,B,C,D,E | CLASS) which achieves the desired grouping.

In a manner similar to the one used to obtain marginal probabilities, the
function f(A,B,C | CLASS) is accumulated iteratively in field [q] in each table
entry. The values in the fields [q] are then quantized to a desired accuracy and
stored in the output field [r] of each entry.

So far we have dealt exclusively with the process of training, i.e. how the
data is entered into the tables. Interpretation of an unknown sample, or
evaluation of its class conditional probability, may be explained with
reference to Fig. 2. Suppose that the unknown input is (A',B',C',D',E'). The
marginal density is given by

$$P(A',B',C' \mid \text{CLASS}) = \frac{[p]_1}{\sum [p]_1}$$

and the conditional probability is given by the second table as

$$P(D',E' \mid [r]_1, \text{CLASS}) = \frac{[p]_2}{\sum [p]_2},$$

where the subscripts 1 and 2 refer to fields of the first and second table
respectively. The required probability is then obtained as the product of these
two factors.
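
The interpretation step can be sketched in Python as below. The dictionaries
`p1`, `r1` and `p2` stand for the [p] and [r] fields of Table 1 and the [p]
field of Table 2; these names, the toy contents, and the choice to normalize
the second-table counts over the section selected by [r] are assumptions made
for illustration, since the report does not spell out the range of the second
sum.

```python
def two_level_density(sample, p1, r1, p2):
    """Estimate P(A,B,C,D,E | CLASS) as marginal * conditional."""
    a, b, c, d, e = sample
    total1 = sum(p1.values())
    marginal = p1.get((a, b, c), 0) / total1 if total1 else 0.0

    r = r1.get((a, b, c), 0)           # quantized output of the first table
    # Assumption: normalize over the section of Table 2 selected by r.
    section_total = sum(n for (_, _, rr), n in p2.items() if rr == r)
    conditional = p2.get((d, e, r), 0) / section_total if section_total else 0.0
    return marginal * conditional


# Hypothetical table contents for one class.
p1 = {(0, 0, 0): 6, (1, 0, 0): 2}                 # counts in Table 1
r1 = {(0, 0, 0): 7, (1, 0, 0): 3}                 # 3-bit outputs of Table 1
p2 = {(0, 0, 7): 3, (1, 0, 7): 3, (0, 0, 3): 2}   # counts in Table 2, keyed (D, E, r)

density = two_level_density((0, 0, 0, 1, 0), p1, r1, p2)
```

For the sample shown, the marginal is 6/8, the conditional is 3/6, and the
product is 0.375.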

The above highly simplified example also indicates an important property
of the signature table method. During both the training and interpretive phases
a table derives all the necessary information from the outputs of its immediate
predecessors. This fact can greatly simplify the programs required for the
construction and the execution of complicated signature table networks. For more
details of the programs and for more general speech specific aspects of the
signature table usage, the reader is referred to Samuel [3].

In  the  next  section  we  obtain  the  conditions  under  which  the  decomposition 
of  a  higher  dimensional  probability  density  function  into  a  product  of  two  lower 
dimensional  functions  is  valid.  In  the  following  section  we  outline  a  method  of 
estimation  of  marginal  and  conditional  probability  densities  which  occur  in  the 
decomposition.  The  signature  table  is  shown  to  be  an  approximation  of  this 
method of density estimation. The last section describes results of experiments
performed using a rather small set of training samples. The objective is to
demonstrate the feasibility of this method rather than to obtain the
statistically valid error bounds that would be needed were this technique used
for probability density estimation per se, rather than for a classification or
a recognition task.
2.0 Decomposition of Probability Density Functions


The simplified example given in the previous section made an implicit
assumption that the random variables were discrete, each with 8 distinct states
representable by 3 bits. It is more illuminating, however, to obtain the
conditions for decomposition in the continuous domain. The discrete situation
may then be treated as a special case.

Let $x_1, x_2, \ldots, x_N$ be N continuous, non-independent random variables.
Let $p(x_1, x_2, \ldots, x_N \mid C)$ define a class conditional probability
density function which is continuous everywhere in $R^N$ space. The objective
is to factorize this function such that each factor represents a function whose
dimensionality is less than N. Write

$$p(x_1, \ldots, x_I, \ldots, x_N \mid C) = p(x_{I+1}, \ldots, x_N \mid x_1, x_2, \ldots, x_I, C) \cdot p(x_1, \ldots, x_I \mid C) \tag{2.1}$$

and also let

$$p(x_1, \ldots, x_I, \ldots, x_N \mid C) = p(x_{I+1}, \ldots, x_N \mid f(x_1, \ldots, x_I), C) \cdot p(x_1, \ldots, x_I \mid C) \tag{2.2}$$

Consider the factorization in equation 2.1. The second factor, the marginal
density, has a dimensionality I which is less than N. But the first factor,
although a conditional, has an implicit dimensionality of N. In equation 2.2 we
have grouped together the conditioning variables $(x_1, \ldots, x_I)$ by an as
yet undefined function f, to give a mapping of I dimensions to 1. So the
dimensionality of this factor is essentially (N-I+1).

Now define

$$f(x_1, \ldots, x_I) = p(x_1, \ldots, x_I, x_{I+1}=0, \ldots, x_N=0 \mid C)$$

or for notational convenience,

$$f(x_1, \ldots, x_I) = d(x_1, x_2, \ldots, x_I \mid C) \tag{2.3}$$

called the degenerate probability density function, with the variables
$x_{I+1}$ through $x_N$ set to zero or any other convenient, arbitrary set of
constants. With this definition of the function f we shall show that equations
2.1 and 2.2 are equivalent.

Consider a partition in the $R^N$ space generated by setting

$$p(x_1, x_2, \ldots, x_N \mid C) = K \quad \text{where } 0 < K < 1 \tag{2.4}$$

and denote the set of all solution vectors which satisfy 2.4 by $S_N(X_N)$,
where $X_N$ is an N-vector. Also let

$$d(x_1, x_2, \ldots, x_I \mid C) = K$$

and denote the solution set of this equation by $S_I(X_I)$. Clearly, from the
definition of the degenerate function 2.3,

$$\text{for any } (X_I \in S_I) \Rightarrow ([X_I \equiv X_N] \in S_N) \tag{2.5}$$

with the proviso that only the first I terms in the vectors $X_I$ and $X_N$
need match.

Now consider within this partition (defined by 2.4) the factor which
determines the conditional density in equation 2.2, namely,

$$\begin{aligned}
p(x_{I+1}, \ldots, x_N \mid d(x_1, \ldots, x_I), C) &= p(x_{I+1}, \ldots, x_N \mid (X_I \in S_I), C) \\
&= p(x_{I+1}, \ldots, x_N \mid (X_N \in S_N), C) \quad \text{from 2.5} \\
&= p(x_{I+1}, \ldots, x_N \mid x_1, x_2, \ldots, x_I, C)
\end{aligned}$$

Since the other terms in equations 2.1 and 2.2 are identical and the
condition 2.5 holds true for the complete range of values K (in 2.4) may take,
the required equivalence holds in general.

To summarize the result obtained in this section, we have shown that it is
possible to factorize an N dimensional probability density function into an I
dimensional marginal and an (N-I+1) dimensional conditional probability density
function. The explicit dimensionality of the product (eq. 2.2) is either I or
(N-I+1), whichever is greater. The savings in the storage requirement can
therefore be quite significant.


3.0  Estimation  of  Probability  Density 


In the previous section we have obtained the conditions for the
decomposition, which requires the estimation of three different functions:

1) the marginal density $p(x_1, x_2, \ldots, x_I)$,

2) the degenerate density $d(x_1, x_2, \ldots, x_I)$,

and 3) the conditional density $p(x_{I+1}, \ldots, x_N \mid d(x_1, x_2, \ldots, x_I))$.

Further, we require that the training algorithm to be used for the
estimation be iterative so as to admit one sample at a time. The non-parametric
method of density estimation which uses superposed potential functions appears
to be the best candidate which satisfies all these conditions. In its most
general form, the density is estimated by the summation

$$p_M(X) = \frac{1}{M} \sum_{m=1}^{M} \psi(X, X_m) \tag{3.1}$$

where X is a random vector variable, $X_m$ is the m-th sample, M is the total
number of samples available and $\psi$ is any one of the admissible potential
or kernel functions. Parzen [4] has obtained the conditions on $\psi$ under
which 3.1 becomes a valid probability density in one dimension. Murthy [5] has
generalized these conditions for N dimensions. A concise description of various
admissible forms of $\psi$ and the related conditions may be found in Andrews
[6], Sec. 4.3.

We shall use the Gaussian kernel

$$\psi(X, X_m) = (2\pi\sigma^2)^{-N/2} \exp\left(-[X - X_m]^t [X - X_m] / 2\sigma^2\right) = K \exp(\cdot)$$

which was first proposed by Sebestyen [7] and also used later by Specht [8] to
obtain trainable polynomial discriminant functions. The attractive property of
this kernel is that the constant $\sigma$ may be chosen so as to produce the
required smoothness in the generated density. Small values of $\sigma$ cause
each sample to stand out in the summation 3.1, whereas larger values of
$\sigma$ give a smoother surface. The iterative form of the summation using the
Gaussian kernel is simply

$$p_M(X) = \frac{M-1}{M} \, p_{M-1}(X) + \frac{K}{M} \exp(\cdot)$$
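
That the iterative form reproduces the summation 3.1 can be checked numerically.
The following Python sketch evaluates both at one point; the function names,
the sample values and the choice of $\sigma$ are illustrative assumptions.

```python
import math


def gaussian_kernel(x, xm, sigma):
    """K * exp(-|x - xm|^2 / (2 sigma^2)) with K = (2 pi sigma^2)^(-N/2)."""
    n = len(x)
    k = (2.0 * math.pi * sigma ** 2) ** (-n / 2.0)
    dist2 = sum((a - b) ** 2 for a, b in zip(x, xm))
    return k * math.exp(-dist2 / (2.0 * sigma ** 2))


def batch_estimate(x, samples, sigma):
    """Summation 3.1: the average of the kernel over all M samples."""
    return sum(gaussian_kernel(x, s, sigma) for s in samples) / len(samples)


def iterative_estimate(x, samples, sigma):
    """Iterative form: admit one sample at a time, keeping only the running value."""
    p = 0.0
    for m, s in enumerate(samples, start=1):
        p = (m - 1) / m * p + gaussian_kernel(x, s, sigma) / m
    return p


# Hypothetical two dimensional samples and evaluation point.
samples = [(0.0, 0.0), (1.0, 0.5), (0.2, -0.3), (2.0, 2.0)]
point = (0.5, 0.1)
batch = batch_estimate(point, samples, sigma=0.7)
iterative = iterative_estimate(point, samples, sigma=0.7)
```

The two values agree to within floating point rounding, and the iterative
version never needs all the samples in storage at once.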

The marginal densities can use this form directly. The degenerate
densities become

$$d_M(x_1, x_2, \ldots, x_I) = p_M(x_1, x_2, \ldots, x_I, x_{I+1}=0, \ldots, x_N=0)$$

The estimation of the density which is conditional on the degenerate is
clearly a second level process, in the sense that it presumes a stabilized,
consistent degenerate estimate. It also implies that we need to estimate a
density of the form $p(x_{I+1}, \ldots, x_N)$ for every possible value the
degenerate may take. It appears therefore that evaluation of the conditional
factor in the continuous domain is infeasible. Thus we must quantize the range
of values of the degenerate generated by the training process at the first
level and then obtain the conditional densities for each of these values at the
second level.


3.1  Pragmatic  Considerations 


The preceding discussion may give the impression that the decomposition we
have achieved, which leads to a simpler estimation problem (through a reduction
in dimensionality), is largely illusory, the reason being that the estimation
of the conditionals hides an inherently higher dimensional problem. The
argument is certainly valid in the continuous domain. However, every practical
problem involves a discretization at some stage. At best it would be based on
the signal to noise ratio in the measurement, and at worst on a cruder
quantization dictated by available resources. Our claim is that the method
outlined first simplifies the problem by decomposition and then gives control
over the error which will be propagated to the higher level, merely by
quantization of the degenerate to a required accuracy. In the next section,
where the signature table method of estimating the density is described, we
outline possible modes that can be used to quantize the degenerate. These modes
of quantization appear reasonable for the problem at hand, namely multicategory
pattern recognition.

4.0  Estimation  Using  Signature  Tables 

A signature table assumes explicit quantization (not necessarily uniform)
of the inputs. There is a unique entry, or signature-type, in a table for each
of the possible combinations generated by the quantized inputs. Therefore any
function which is to be evaluated by a table is known only at those points.

First consider the estimation of a marginal probability density for, say, 3
variables. Using the iterative formulation 3.4,

$$p_M(x_1, x_2, x_3) = \frac{M-1}{M} \, p_{M-1}(x_1, x_2, x_3) + \frac{K}{M} \exp\left(-\left[(x_1 - x_1^m)^2 + (x_2 - x_2^m)^2 + (x_3 - x_3^m)^2\right] / 2\sigma^2\right)$$

where $K = (2\pi\sigma^2)^{-3/2}$ for N=3 and $(x_1^m, x_2^m, x_3^m)$ is the
m-th training sample. Now consider an arbitrary table entry
$(x_1^i, x_2^j, x_3^k)$. The factor (M-1)/M, which normalizes the accumulated
density with M-1 samples, can be neglected even for moderate values of M. Thus
the new density after the M-th sample is obtained by adding the increment

$$\frac{K}{M} \exp\left(-\left[(x_1^i - x_1^m)^2 + (x_2^j - x_2^m)^2 + (x_3^k - x_3^m)^2\right] / 2\sigma^2\right) \tag{4.1}$$

to the count stored for this entry. Apparently the above increment must be
computed and added for each entry in the table. However, since the Gaussian
kernel decays exponentially, the number of table entries for which the
increment is significant can be small. The entry for which the increment is
maximum is given by

$$\left(\min_i |x_1^i - x_1^m|,\; \min_j |x_2^j - x_2^m|,\; \min_k |x_3^k - x_3^m|\right)$$

where i, j and k are varied over the respective range of quantization. Other
entries for which the increment might be significant are the neighboring
entries. Thus the search for the entries which must be modified is
straightforward.
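
A possible realization of this update, restricted to the nearest entry and its
immediate neighbors, is sketched below in Python. The table is held as a
dictionary keyed by entry indices; the function names, the `radius` parameter,
the use of the same quantization levels for every input and the numerical
values are all assumptions made for the sake of the example.

```python
import itertools
import math


def nearest_level(value, levels):
    """Index of the quantization level closest to `value`."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))


def add_sample(table, levels, sample, sigma, m_total, radius=1):
    """Add increment 4.1 to the nearest entry and to neighbors within `radius`."""
    n = len(sample)
    k = (2.0 * math.pi * sigma ** 2) ** (-n / 2.0)
    centre = [nearest_level(v, levels) for v in sample]
    ranges = [range(max(0, c - radius), min(len(levels), c + radius + 1))
              for c in centre]
    for idx in itertools.product(*ranges):
        dist2 = sum((levels[i] - v) ** 2 for i, v in zip(idx, sample))
        increment = k / m_total * math.exp(-dist2 / (2.0 * sigma ** 2))
        table[idx] = table.get(idx, 0.0) + increment
    return tuple(centre)


# Hypothetical set-up: 8 uniformly spaced levels per input, one training sample.
levels = [float(v) for v in range(8)]
table = {}
centre = add_sample(table, levels, (2.2, 4.9, 1.0), sigma=0.5, m_total=1)
```

Only 27 of the 512 entries are touched for this sample, and the largest
increment falls on the entry nearest to the sample, as the text states.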

Estimation of the degenerate density is done in an analogous manner. The
degenerate variables are set to zero or some other more convenient constant and
do not figure in the location of an entry. Thus if $x_3$ were a degenerate in
the previous example then the increment for entry $(x_1^i, x_2^j)$ would be

$$\frac{K}{M} \exp\left(-(x_3^m)^2 / 2\sigma^2\right) \cdot \exp\left(-\left[(x_1^i - x_1^m)^2 + (x_2^j - x_2^m)^2\right] / 2\sigma^2\right) \tag{4.2}$$

and the degenerate contribution factor would be the same for all the entries.

In the foregoing analysis we have assumed that all the variables are
continuous. However, if some variables are inherently discrete or can be
reasonably assumed to be so, then the computational requirements may be reduced
considerably. The difference terms in the exponential factor become zero for the
discrete variables. If all the variables are discrete then the maximum increment
becomes K/M, and the increments for the immediate neighbors may be obtained by
efficient table look-up procedures.

The third type of density to be estimated, the conditional, has the general
form (dropping the indices used in the decomposition)

$$p(x_1, x_2, \ldots \mid d(y_1, y_2, \ldots)).$$

Assume that previous training gives consistent estimates of
$d(y_1, y_2, \ldots)$ and that we have "suitably" quantized the range of d into
Q intervals. Thus the value of Q is also stable and consistent. Now the
required conditional density is obtained by a set of Q second level tables.
Each table in effect estimates a separate marginal density function, or

$$p_M(x_1, x_2, x_3, \ldots \mid j), \quad j = 1 \text{ to } Q,$$

where the j merely acts as a switch to choose one of the Q second level tables.


4.1  Quantization  of  the  Degenerate  Density 


Appropriate quantization of the degenerate density turns out to be the
focal issue in this approach. The degree of quantization determines the
trade-off between savings in storage (and computation) and the required
accuracy. At the risk of being repetitious, we may say that if the degenerate
is not quantized at all then the storage required is the same as for a large,
single, one level table, whereas the fewer the intervals into which the
degenerate is quantized, the greater the reduction in storage.

There are two possible approaches to the quantization problem. We may
treat the estimation of a class conditional probability independently of other
classes, or we may cross reference between classes at lower levels in the
signature table hierarchy. First consider quantization of a degenerate
independently of other classes. Also assume that we are resource bound and the
number of intervals into which it must be divided (Q) is prespecified.

Division of the degenerate range into Q equal intervals may be ruled out,
since it would mean that intervals reflecting lower density would have very few
samples at the next level. Q intervals spaced equally over the logarithm of the
degenerate will give a more equitable distribution of samples at the next
level, though this would not be optimal if the underlying multivariate density
does not have some exponential form, or if it is multimodal.
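
As an illustration of logarithmic spacing, the Python sketch below places Q-1
interior boundaries equally spaced in the logarithm of the degenerate. The
function names and the range limits are hypothetical, and the sketch assumes
that the smallest degenerate value of interest is strictly positive.

```python
import bisect
import math


def log_boundaries(d_min, d_max, q):
    """Q-1 interior boundaries equally spaced in log(d) between d_min and d_max."""
    lo, hi = math.log(d_min), math.log(d_max)
    step = (hi - lo) / q
    return [math.exp(lo + step * j) for j in range(1, q)]


def quantize(value, boundaries):
    """Interval number (0 .. Q-1) into which a degenerate value falls."""
    return bisect.bisect_right(boundaries, value)


# Hypothetical degenerate range covering four decades, quantized with Q = 4.
boundaries = log_boundaries(1e-4, 1.0, 4)
low_interval = quantize(5e-4, boundaries)
high_interval = quantize(0.5, boundaries)
```

With these limits the boundaries fall near 0.001, 0.01 and 0.1, so each
interval spans one decade of the degenerate.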

The quantization may be made dependent upon the data accumulated in the
table itself. Each interval is chosen such that the integral of the associated
marginal density over that interval is 1/Q. Computationally this involves 1)
ordering the accumulated degenerate values and 2) summing up the corresponding
marginal values until the sum equals 1/Q of the total and placing the interval
boundary at this point. The process is time consuming as it requires sorting
the degenerate values in a table. However, this quantization need only be done
after a sufficiently large number of samples have been processed so as to give
stable quantization boundaries. This method of degenerate quantization was used
for the recognition experiment reported in the next section.
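
The two computational steps can be sketched in Python as follows. Each table
entry is represented by a (degenerate, marginal) pair; the function names, the
convention that a boundary belongs to the interval below it, and the toy data
are assumptions for illustration and are not taken from the program used in the
experiment.

```python
import bisect


def equal_mass_boundaries(entries, q):
    """Boundaries that split the total marginal mass into Q roughly equal parts.

    entries: (degenerate value, marginal value) pairs, one per table entry.
    """
    ordered = sorted(entries)                     # step 1: order by degenerate value
    total = sum(marginal for _, marginal in ordered)
    boundaries, running, target = [], 0.0, 1
    for degenerate, marginal in ordered:          # step 2: accumulate marginal values
        running += marginal
        while target < q and running >= target * total / q:
            boundaries.append(degenerate)         # boundary placed at this point
            target += 1
    return boundaries


def interval_of(value, boundaries):
    """Interval number of a degenerate value (a boundary belongs to the interval below)."""
    return bisect.bisect_left(boundaries, value)


# Hypothetical table contents: eight entries of equal marginal weight, Q = 4.
entries = [(0.5, 1), (0.1, 1), (0.8, 1), (0.3, 1),
           (0.2, 1), (0.7, 1), (0.4, 1), (0.6, 1)]
boundaries = equal_mass_boundaries(entries, q=4)
counts = [0] * 4
for degenerate, _ in entries:
    counts[interval_of(degenerate, boundaries)] += 1
```

For this data the boundaries fall at 0.2, 0.4 and 0.6, and each of the four
intervals receives two entries.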

The quantization may be made error bound if Q is not prespecified. It
would also involve a sort of the degenerate values. The interval boundaries may
then be located such that every degenerate value is within the specified error
from the nearest boundary.
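
One way to realize an error bound quantization is a single greedy pass over the
sorted degenerate values, sketched below in Python. The greedy placement rule,
the function name and the sample values are assumptions; the report states only
the condition that the resulting boundaries must satisfy.

```python
def error_bound_boundaries(values, err):
    """Place boundaries so that every value lies within `err` of some boundary.

    Greedy rule (an assumption): walking through the sorted values, whenever a
    value is not covered by the last boundary, place a new boundary `err` above it.
    """
    boundaries = []
    for v in sorted(values):
        if not boundaries or v - boundaries[-1] > err:
            boundaries.append(v + err)
    return boundaries


# Hypothetical accumulated degenerate values and error bound.
values = [0.50, 0.05, 0.90, 0.12, 0.58, 0.10]
err = 0.05
boundaries = error_bound_boundaries(values, err)
q = len(boundaries)          # Q is an outcome here, not a prespecified quantity
```

Here three boundaries suffice, so Q follows from the data and the chosen error
rather than being fixed in advance.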


The quantization methods discussed so far attempt to get a multivariate
density estimate for a single class. With simultaneous quantization of the
appropriate degenerates of all the class categories to be recognized, it may be
possible to minimize the misclassification rate and also minimize the storage
requirement.


First consider a simple case with only two classes. The quantization
boundaries should be placed at a value where the two degenerate functions are
equal. The approximate boundary location may be found by comparing the ordered
sets of degenerate values. Clearly, the misclassification rate is given by the
sum of all the marginal densities for which the degenerates are below this
value. The exact boundary value may then be chosen so as to minimize these sums.
The implication of following such a procedure in a two class situation is that
we have located the optimum discriminant boundary, and 1 bit quantization (Q=2)
of the degenerate is sufficient.

In a multi-category situation a similar procedure involves simultaneous
location of the discriminant boundaries between all the possible pairs of class
categories so as to minimize the overall misclassification rate. A sub-optimal
approach is possible which avoids the minimization problem. For a given class,
locate all the cross-over points where the degenerates of the other classes are
nearly equal to the degenerate of this class, and provide a quantization with
maximum resolution in the cross-over range.

This  approach  to  quantization  with  cross  reference  between  the  classes  to 
be  recognized,  would  certainly  produce  near  optimal  results.  However  it  would 
tend  to  be  highly  sensitive  to  the  stationarity  of  the  underlying  probability 
distributions.  One  bad  category  whose  probability  distribution  changes  with  time 
could  drastically  alter  the  overall  performance  of  the  system.  However  if  each 
class  conditional  probability  density  is  obtained  independently  of  the  other 
classes,  then  a  drifting  category  would  affect  only  the  nearby  categories  and 
could  easily  be  identified. 

5.  Experiments 


Experiments in vowel recognition illustrate the application of the
signature table method of probability density estimation discussed above. The
data used for the experiments is derived from 51 words spoken by one speaker.
Vowel data extracted from 26 words is used for training the tables. Data derived
from the remaining 25 words is used for recognition.

The speech is digitized with a sampling rate of 20 kHz and 12-bit
quantization. A frequency domain representation of the speech is obtained by
taking a 256 sample FFT (12.8 msec.) with 128 sample overlap between successive
FFTs. A set of 19 parameters, such as three major peaks in specified frequency
ranges and their corresponding amplitudes, energies (average) in certain
frequency regions etc., are obtained by measurements on each FFT. However, in
the following limited experiments we have used only the three vowel formants and
their amplitudes, since these six parameters are known to be significant for
vowel perception.

Each parameter measurement is scaled and quantized to a 6-bit value in the
first instance. For more details of the parameterization, the reader is referred
to [2]. Every parameter vector, one every 6.4 msec. of speech, is given an
appropriate tag according to the vowel category to which it belongs. This
labeling is done by visual inspection of the speech waveform. The mnemonics that
have been used to identify the 12 vowel categories are given in Table 1.

Three  types  of  analysis  are  performed  to  give  a  comparative  evaluation  of 
the  signature  table  method. 

1) A nearest-neighbor analysis (9) is done using the full 6-bit range of the six parameters. The results of this analysis serve as a guide for the interpretation of other results and also give a feel for the inherent overlap in the data.

2) The six parameters are assumed to be independent of each other. The quantization is reduced from 6 bits to 3 bits and a single-input signature table is used to find the class conditional probability for a parameter for each category. Thus, this method requires 72 tables (six parameters for each of 12 categories), each having 8 entries, for a total storage of 576 words. The six dimensional joint probability for a category is obtained as the product of the six individual probabilities.
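The independence-assumption analysis of item 2 can be sketched as follows. This is a minimal illustration, not the program used in the report: the class and function names are hypothetical, the 6-bit to 3-bit reduction is assumed to be a plain right shift, and the decision rule is assumed to pick the category with the largest product of per-parameter relative frequencies.

```python
N_PARAMS = 6   # F1, F2, F3 and their amplitudes
LEVELS = 8     # 3-bit quantization


def requantize(value_6bit):
    """Reduce a 6-bit value (0..63) to 3 bits (0..7); plain truncation is assumed."""
    return value_6bit >> 3


class IndependentTables:
    """One 8-entry single-input table per parameter per category."""

    def __init__(self, categories):
        self.counts = {c: [[0] * LEVELS for _ in range(N_PARAMS)]
                       for c in categories}
        self.totals = {c: 0 for c in categories}

    def train(self, vector, category):
        # Training reduces to counting: one increment per parameter table.
        for i, value in enumerate(vector):
            self.counts[category][i][value] += 1
        self.totals[category] += 1

    def score(self, vector, category):
        # Joint estimate = product of the six single-parameter frequencies.
        n = self.totals[category]
        if n == 0:
            return 0.0
        p = 1.0
        for i, value in enumerate(vector):
            p *= self.counts[category][i][value] / n
        return p

    def classify(self, vector):
        return max(self.counts, key=lambda c: self.score(vector, c))
```

With 12 categories this structure holds 12 x 6 x 8 = 576 counters, the storage figure quoted in item 2.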

3) The probability density estimation method outlined in this report gives the third set of results. The implementation details, and the approximations used therein, are given in the next section.

5.1  Implementation Details


The block schematic of the five-level signature table arrangement used to generate the required 6-variable density is shown in Fig. 3. All the inputs are quantized to 3 bits. The size of each table is thus 64 words. The total storage required is therefore 5*64*12, or 3840 words.
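The storage figures can be reproduced with simple arithmetic. The sketch below uses hypothetical variable names; the size of a single undecomposed six-input table is not quoted in this section and is computed here only for comparison.

```python
BITS = 3
LEVELS = 2 ** BITS     # 8 values per 3-bit input
N_PARAMS = 6
N_CATEGORIES = 12
N_TABLE_LEVELS = 5     # two-input tables per category in the hierarchy

# One table addressed by all six 3-bit inputs, per category (comparison only).
full_joint_words = LEVELS ** N_PARAMS * N_CATEGORIES

# Independence assumption: six single-input tables per category.
independent_words = N_PARAMS * LEVELS * N_CATEGORIES

# Hierarchy of two-input tables: 5 tables of 8*8 entries per category.
hierarchy_words = N_TABLE_LEVELS * LEVELS ** 2 * N_CATEGORIES
```

Under these assumptions the hierarchy needs 3840 words and the independent tables 576 words, against roughly 3.1 million words for one full six-input table per category.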

Since all the inputs are discrete, the exponential factor in the evaluation of a marginal density (eq. 4.1) becomes 1. The accompanying multiplicative factor ensures convergence of the estimate for large values of M, but it also has the effect of weighting down the samples which come later in the training sequence. Since the maximum number of training samples for a category in the present experiment was only 86, all the samples were given equal weight. Therefore the increment used to generate the marginal density is 1.

The evaluation of a degenerate increment involves an exponential factor (eq. 4.2). For example, the first table in Fig. 3 has four degenerate inputs. Therefore this factor would be

$$\exp\left(-\frac{A_1^2 + A_2^2 + F_1^2 + F_2^2}{2a^2}\right)$$

The quantization methods discussed in Sec. 4.1 show that the absolute value of the degenerate is unimportant so long as the relative ordering of the entries within a table is maintained. The exponential may therefore be approximated by using only the first term in the expansion for each input, i.e.

$$\left(1 - (A_1/7)^2\right)\left(1 - (A_2/7)^2\right)\left(1 - (F_1/7)^2\right)\left(1 - (F_2/7)^2\right)$$
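The two weighting expressions can be compared numerically. The following is a sketch under stated assumptions: the function names are hypothetical, the inputs are taken to be 3-bit values in the range 0..7, and the smoothing parameter `a` of the exponential form is left as a free argument because its value is not given in this section.

```python
import math


def exponential_weight(inputs, a):
    """Exponential factor exp(-(sum of squared inputs) / (2 a^2))."""
    return math.exp(-sum(x * x for x in inputs) / (2.0 * a * a))


def first_term_weight(inputs, full_scale=7):
    """Product of (1 - (x/7)^2) over the inputs, the approximation in the text."""
    weight = 1.0
    for x in inputs:
        weight *= 1.0 - (x / full_scale) ** 2
    return weight


# Both weights fall as any single input grows from 0 to 7.
approx = [first_term_weight([x, 0, 0, 0]) for x in range(8)]
exact = [exponential_weight([x, 0, 0, 0], a=7.0) for x in range(8)]
```

The approximation equals 1 when all four inputs are 0 and drops to 0 as soon as one input reaches full scale, whereas the exponential never reaches 0. The sketch only illustrates that both decrease monotonically in each input; it does not show that the two forms rank every pair of entries identically.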

Also, the contribution of a training sample to its neighbors is assumed to be negligible, as the quantization itself is rather gross. The factor a, which determines the smoothness of the distribution, is also neglected.

Now consider how the six-term conditional probability is obtained using the various marginal densities computed by the lower level tables in Fig. 3. In the factorization

$$p(F_1,F_2,F_3,A_1,A_2,A_3) = p(F_2 \mid d(F_1,F_3,A_1,A_2,A_3)) \, p(F_1,F_3,A_1,A_2,A_3)$$

the first factor on the r.h.s. is given by the marginal output of Table 5. The second factor is the five-term marginal output of Table 4, which in turn has been obtained using a similar factorization. The configuration given in Fig. 3 is repeated for all the 12 categories.

As the total number of training samples, 738, is rather small and each level in the table hierarchy must be somewhat stable before meaningful training of the next higher level can be done, the same data is used in five passes over the set of tables. After each pass the next higher level tables are enabled, so that accumulation of counts in appropriate entries can be started.
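The level-by-level training schedule can be illustrated with a small sketch. Several things here are assumptions rather than details from the report: the wiring (each table takes the 3-bit output of the table below it plus one new parameter, in an arbitrary parameter order), the output quantization (counts scaled by the largest count in the table), and the choice to accumulate counts in only one table per pass. All names are hypothetical.

```python
LEVELS = 8          # 3-bit inputs and 3-bit table outputs
N_TABLE_LEVELS = 5  # five two-input tables per category


class ChainedTables:
    """A chain of two-input tables for a single category (hypothetical wiring)."""

    def __init__(self):
        # counts[k][a][b] is the count stored at address (a, b) of table k.
        self.counts = [[[0] * LEVELS for _ in range(LEVELS)]
                       for _ in range(N_TABLE_LEVELS)]

    def output(self, k, a, b):
        """Quantize the count at (a, b) of table k to 3 bits, relative to its peak."""
        peak = max(max(row) for row in self.counts[k])
        if peak == 0:
            return 0
        return (LEVELS - 1) * self.counts[k][a][b] // peak

    def address(self, k, vector):
        """Address of a six-parameter vector in table k, routed through tables 0..k-1."""
        a, b = vector[0], vector[1]
        for j in range(k):
            a, b = self.output(j, a, b), vector[j + 2]
        return a, b

    def train(self, samples):
        # Pass k fills table k only; the tables below it are no longer changing.
        for k in range(N_TABLE_LEVELS):
            for vector in samples:
                a, b = self.address(k, vector)
                self.counts[k][a][b] += 1

    def score(self, vector):
        """Unnormalized count read from the top-level table."""
        a, b = self.address(N_TABLE_LEVELS - 1, vector)
        return self.counts[N_TABLE_LEVELS - 1][a][b]
```

One such chain would be built per category, with an unknown vector assigned to the category whose chain returns the largest normalized score. Each chain holds 5 x 64 = 320 counters.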


5.2  Results


Exactly the same data as used for the training is used for the classification experiment. The objective is to determine the error introduced by the gross quantization and the hierarchic organization of the signature tables. Clearly, the nearest-neighbor procedure would give 100% correct classification. Therefore, the scatter matrix shown in Table 2 is produced by finding the sample vector which is closest without giving an exact match. The overall correct figure of 65.9% indicates that on the average 34% of the training samples have a neighbor of a different kind. The classification result using the present method (74.3%, Table 3) shows that even with 3-bit quantization, the probability estimate does improve the classification. As expected, the results obtained when independence is assumed are poorer (58.1%, Table 4). The vowels which are weakly articulated (AS, I, A, U) have more variability in the data and tend to get swamped by the stronger vowels.

The recognition results obtained with unknown samples, which have been extracted from 25 words, are given in Tables 5, 6 and 7. The reader is probably appalled by the low overall recognition rates. This "bad" example has been chosen with a purpose. None of the vowels used in the training set occur in exactly the same phonetic context as in the recognition set. Because of the high context sensitivity of some of the vowels, in particular AS, I, A, AR and U, most of the unknown samples tend to fall in the empty space between the "learned" categories. The result of the nearest-neighbor analysis is 34.6% (Table 5), compared with the signature table method result of 32.9% (Table 6). The signature tables thus have a comparable performance, even with 3-bit quantization. The apparently superior performance obtained when independence between the inputs is assumed (36.1%) is at the expense of the weaker categories, as seen in Table 7.

However, to get back to the main purpose of choosing this particular example, the signature table method allows us to defer making a decision until a wider context has been analyzed. A compound decision can then be made using this contextual information. If one is allowed to consider the second choices in the above example, even when no context is taken into account, the recognition score increases to 48.8%, a rise of 16%, as shown in Table 8.
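Scoring with the second choice considered amounts to counting a sample as recognized when its true label is among the k best-scoring categories. A minimal sketch, with a hypothetical function name and made-up example labels:

```python
def top_k_rate(ranked_choices, truths, k):
    """Percentage of samples whose true label is among the first k ranked choices."""
    hits = sum(1 for ranked, truth in zip(ranked_choices, truths)
               if truth in ranked[:k])
    return 100.0 * hits / len(truths)


# Toy data (not from the report): ranked category guesses for four samples.
ranked = [["EE", "I"], ["AS", "A"], ["O", "OO"], ["AR", "A"]]
truths = ["EE", "A", "OO", "U"]

first_choice_rate = top_k_rate(ranked, truths, k=1)
second_choice_rate = top_k_rate(ranked, truths, k=2)
```

In this toy case the score rises from 25% with the first choice alone to 75% when the second choice is also accepted. Keeping the ranked list rather than a single label is what lets a later, context-aware stage make the compound decision.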

It is also obvious that the context sensitivity in this data set is somewhat contrived, 1) by the rather arbitrary division of the list of 51 words into two sets, and 2) by using only 738 samples for training against the 465 used in recognition. Clearly, in actual usage the training set would have an order of magnitude more samples, and those would be derived from a more representative context.

6.  Conclusions


We have shown that it is possible to decompose a higher dimensional probability density function into two factors whose dimensionality is less than that of the original function. The decomposition is iterative, and hence the dimensionality of the functions which are actually evaluated can be reduced to any desired order. However, the errors are propagated in each iteration, and the savings in storage accrued must be balanced against the desired accuracy.

The signature table method of training is shown to be effective for the estimation of the various probability density functions which arise as the result of the decomposition. The signature table method 1) does not require assumptions regarding the underlying probability distribution, 2) allows sequential introduction of the training samples, and 3) is very efficient: the estimation process reduces to simple counting when the input variables are discrete.

The  disadvantages  of  the  method  are  1)  the  errors  tend  to  propagate  from 
one  level  to  the  next  in  the  table  hierarchy  and,  2)  the  number  of  training 
samples  required  grows  in  proportion  to  the  number  of  levels,  so  as  to  ensure 
overall  convergence  of  the  final  estimate. 

The first set of classification experiments shows that even with possibly the worst decomposition, where a six dimensional density is decomposed into 5 levels with 2 inputs each, and with a reduced 3-bit quantization, the signature table method has a better performance than the corresponding nearest-neighbor and independence-assumption experiments.

The second set of recognition experiments is somewhat artificially contrived to highlight the application of this method to speech recognition and other similar applications where a high degree of context sensitivity makes it imperative to have some confidence estimate of the purely local decision. A compound decision can then be made using these local estimates. Of course, the basic assumption here is that the combinatorics of the problem rule out a compound decision which is based on the basic feature measurements over all of the desired context.

[Figure: boxes labelled Table 1 (512 entries), Table 2 (512 entries) and Table 3 (64 entries); entry columns p, q, r; annotation P("V" | F1,F2,F3) = p/(p+q) -> [r]]

Fig. 1. A Simplified Example of a Two-level Signature Table Arrangement Used in the Speech Recognition Program [2].

Fig. 2. A Simplified Two-level Signature Table Arrangement

Specific values of the inputs, say A’, B’ and C’, when concatenated form an address which points to the entry shown. During training, if CLASS is indicated for this input then the column [p_1] is incremented by 1 and column [q_1] is incremented by the function f. The table outputs in column [r_1] are obtained by quantization of the values in [q_1]. The probability calculation performed during the recognition phase is as shown in the figure.

Fig. 3. A Six-input, Five-level Signature Table Arrangement Used in the Experiments. A ==> Indicates a Degenerate Input.

EE : beet
AE : bat
E  : bait
I  : bit
AS : about
AA : bar
AR : bird
A  : but
AW : bought
OO : boot
O  : boat
U  : book

Table 1. Vowel Mnemonics Used in the Experiments


Given:    EE  AE  E   I   AS  AA  AR  A   OO  U   O

Found EE  60 2 1 5 1 10 5 3
      AE  3 62 6 4 1
      E   3 5 52 7 2 1 1 2
      I   52? 31 2 5 6
      AS  22 3 21 113445
      AA  4 1 21 4 6
      AR  3 4 6 1 8 50 5 4 5
      A   1 1 1 5 49 4 3 3
      OO  1 3 1 2 56 2 2
      U   2 1823362 28 6
      O   2 6 1 7 3 2 8 57

Total     84  75  66  67  38  35  86  74  68  67  78
%Found    71  83  79  46  55  60  58  66  82  42  73

Overall correct 65.98 %

Table 2. Classification Result of Nearest-Neighbor Analysis


Given:    EE  AE  E   I   AS  AA  AR  A   OO  U   O

Found EE  51 1 1 1 1
      AE  4 54 3 3 1 1
      E   6 1 53 2 1 51 2
      I   4 7 4 46
      AS  4129 29 7 275
      AA  352 4 33 66 4
      AR  1 1 45 1 11
      A   2 1 5 59 1 2
      OO  5 1 4 1 5 66 2
      U   2 2 1 2 3 44 2
      O   2 4 2 12 3 5 68

Total     84  75  66  67  38  35  86  74  68  67  78
%Found    61  72  80  69  76  94  52  80  97  66  87

Overall correct 74.26 %

Table 3. Classification Result Using Signature Tables

Given:    EE  AE  E   I   AS  AA  AR  A   OO  U   O

Found EE  50 4 4 12 4 8 1 2 4 3
      AE  " 63 11 9 1 1
      E   4 45 4 4 6 2
      I   1 1 2 25 7 1 81
      AA  2
      AR  18 3 3 9 12 17 66 11 8 8 9
      A   21 1 1 2 38 2 1 1
      OO  2 2 1 10 52 3
      U   16 12 1 2 24 3
      O   3 9 4 9 11 18 64

Total     84  75  66  67  38  35  86  74  68  67  78
%Found    60  84  68  37  0   6   77  51  76  36  82

Overall correct 58.13 %

Table 4. Classification Result with Independence Assumption

Given:    EE  AE  E   I   AS  AA  AR  AW  O

EE ... 16 ... 7 ... 9 ... 3 ... 2 ... 27 1 8
A   2 1 2 3 8
OO  4 1
U   2 13 4 3 1 4 5
AW
O   8 1 2 32

Total     65  23  61  75  43  43  71  18  66
%Found    65  70  26  17  14  21  38  0   48

Overall correct 34.62 %

Table 5. Recognition Result of Nearest-Neighbor Analysis

Given:    EE  AE  E   I   AS  AA  AR  AW  O

Found EE  31 1 4 3 1
      AE  c 9 7 16 5 6 3
      E   3 4 28 12 O L. 9
      I   8 2 12 3 1 1 8 3
      AS  2 4 11 24 14 5 29 8
      AA  3 1 2 2 4 23 1 9 9
      AR  1 1 2 7 4
      A   2 1 2 2 2 4
      OO  6 4 1 9
      U   1 2 1 5 1 4 1 5
      O   4 1 2 8 1 5 33

Total     65  23  61  75  43  43  71  18  66
%Found    48  39  46  11  33  53  10  0   50

Overall correct 32.9 %

Table 6. Recognition Result Using Signature Tables


Given:    EE  AE  E   I   AS  AA  AR  AW  O

Found EE  39 1 7 14 3 11 1 6 2
      AE  7 21 16 29 6 5 1 1
      E   1 1 23 ... 7 3 5
      I   3 7 10 3 1 19
      ...
      AR  14 8 4 9 21 32 5 15
      ... 1 2
      OO  ... 1 4
      ... 8 1 7 1
      O   7 5 43

Total     65  23  61  75  43  43  71  18  66
%Found    60  91  38  13  0   0   45  0   65

Overall correct 36.13 %

Table 7. Recognition Result with Independence Assumption

Given:    EE  AE  E   I   AS  AA  AR  AW  O

Found EE  39 3 1 1 4
      AE  4 17 7 ... 8 2 2
      ... 0 ... 1 4 1 6 21 1 2 4
      AS  8 2 6 17 24 16 1 1
      AA  1 1 3 2 29 ... o 6 2
      AR  2 5 2 17 3
      A   2 3 3 2 3 1 4
      OO  2 4 1 2 9
      U   2 6 2 3 2 3 2
      O   1 1 1 2 5 1 8 1 45

Total     65  23  61  75  43  43  71  18  66
%Found    60  74  57  28  56  67  24  0   68

Overall correct 48.82 %

Table 8. Recognition Result with Second Choice Considered

Figure and Table Captions.

1) Fig. 1 A Simplified Example of a Two-level Signature Table Arrangement Used in the Speech Recognition Program (2).

2) Fig. 2 A Simplified Two-level Signature Table Arrangement.

Specific values of the inputs, say A’, B’ and C’, when concatenated form an address which points to the entry shown. During training, if CLASS is indicated for this input then the column [p_1] is incremented by 1 and column [q_1] is incremented by the function f. The table outputs in column [r_1] are obtained by quantization of the values in [q_1]. The probability calculation performed during the recognition phase is as shown in the figure.

3) Fig. 3 A Six-input, Five-level Signature Table Arrangement Used in the Experiments. A ==> Indicates a Degenerate Input.

Table 1. Vowel Mnemonics Used in the Experiments.

Table 2. Classification Result of Nearest-Neighbor Analysis.

Table 3. Classification Result Using Signature Tables.

Table 4. Classification Result with Independence Assumption.

Table 5. Recognition Result of Nearest-Neighbor Analysis.

Table 6. Recognition Result Using Signature Tables.

Table 7. Recognition Result with Independence Assumption.

Table 8. Recognition Result with Second Choice Considered.